A Pipeline to Automate the Updating of a Specialized Protein Database

Authors

Sanmay Das

Milton H. Saier

Charles Elkan

Abstract

Motivation: The growing number of specialized databases in molecular biology, coupled with the huge increase in the availability of molecular data, necessitates the development of automatic methods for finding and adding relevant information to these databases. Results: We show how a general protein database (Swiss-Prot) can be used as a source of data for a more specialized one (TCDB, the Transport Classification Database). First, we present a maximum-entropy classification method trained on preprocessed Swiss-Prot records that achieves high precision and recall in determining which records are relevant to transmembrane transport in cross-validation experiments. Next, we describe a set of rules that can be used to further filter out proteins that are not novel, or not well characterized. Using both these pipeline stages, a human expert only has to examine about 2% of Swiss-Prot records for potential inclusion in
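The two pipeline stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bag-of-words features over record text, uses the fact that a binary maximum-entropy model is equivalent to logistic regression, and invents both the toy training records and the filter rules (an already-known accession counts as "not novel", and the words in `UNCHARACTERIZED` count as "not well characterized"). All accessions such as `X1` are hypothetical.

```python
import math

# Hypothetical marker words for "not well characterized" (an assumption, not from the paper).
UNCHARACTERIZED = {"hypothetical", "putative", "uncharacterized"}

def tokenize(text):
    """Turn record text into a set of lowercase word features."""
    return set(text.lower().split())

def score(weights, bias, features):
    """Probability that a record is transport-related under the model."""
    z = bias + sum(weights.get(w, 0.0) for w in features)
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(texts, labels, epochs=300, lr=0.5, l2=0.01):
    """Fit a binary maximum-entropy (logistic regression) model by gradient ascent."""
    feats = [tokenize(t) for t in texts]
    weights, bias, n = {}, 0.0, len(texts)
    for _ in range(epochs):
        grad, grad_b = {}, 0.0
        for f, y in zip(feats, labels):
            err = y - score(weights, bias, f)
            grad_b += err
            for w in f:
                grad[w] = grad.get(w, 0.0) + err
        for w, g in grad.items():
            old = weights.get(w, 0.0)
            weights[w] = old + lr * (g / n - l2 * old)  # L2-regularized step
        bias += lr * grad_b / n
    return weights, bias

def passes_rules(record, known_accessions):
    """Stage 2: drop records that are not novel or not well characterized."""
    if record["accession"] in known_accessions:
        return False
    return not (tokenize(record["text"]) & UNCHARACTERIZED)

def pipeline(records, weights, bias, known_accessions, threshold=0.5):
    """Return accessions that survive both the classifier and the rule filter."""
    return [r["accession"] for r in records
            if score(weights, bias, tokenize(r["text"])) >= threshold
            and passes_rules(r, known_accessions)]

# Toy training data (invented): 1 = relevant to transport, 0 = not relevant.
train_texts = [
    "abc transporter membrane permease",
    "ion channel membrane transport",
    "sugar transporter membrane uptake",
    "tyrosine kinase cytoplasmic signaling",
    "ribosomal subunit cytoplasmic translation",
    "serine kinase cytoplasmic enzyme",
]
train_labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_maxent(train_texts, train_labels)

candidates = [
    {"accession": "X1", "text": "potassium channel membrane transporter"},
    {"accession": "X2", "text": "sugar transporter membrane permease"},
    {"accession": "X3", "text": "hypothetical membrane transporter"},
    {"accession": "X4", "text": "tyrosine kinase cytoplasmic signaling"},
]
shortlist = pipeline(candidates, weights, bias, known_accessions={"X2"})
print(shortlist)
```

In this toy run, only the records that pass both stages would be handed to a human expert, which is the role the abstract assigns to the pipeline.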


Similar resources

Finding Transport Proteins in a General Protein Database

The number of specialized databases in molecular biology is growing fast, as is the availability of molecular data. These trends necessitate the development of automatic methods for finding relevant information to include in specialized databases. We show how to use a comprehensive database (SwissProt) as a source of new entries for a specialized database (TCDB, the Transport Classification Dat...


Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...


iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...


Designing an Intelligent Ontology Construction System Using the ART Neural Network and the C-value Method

In recent years, many efforts have been made to design ontology learning methods and automate the ontology construction process. The ontology construction process is a time-consuming and costly procedure for almost all domains/applications, so automating this process is a solution to overcome the knowledge acquisition bottleneck in information systems and reduce the construction cost. In this artic...


Application of Fuzzy Fault Tree Analysis on Oil and Gas Offshore Pipelines

Fault Tree Analysis (FTA) as a Probabilistic Risk Assessment (PRA) method is used to identify basic causes leading to an undesired event, to represent logical relation of these basic causes in leading to the event, and finally to calculate the probability of occurrence of this event. To conduct a quantitative FTA, one needs a fault tree along with failure data of the Basic Events (BEs). Someti...




Publication date: 2007
